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Abstract 

This paper argues that the operations of a 'Universal Turing Machine' 
(UTM) and equivalent mechanisms such as the 'Post Canonical System' 
(PCS) - which are widely accepted as definitions of the concept of 'com- 
puting' - may be interpreted as information compression by multiple align- 
ment, unification and search (ICMAUS). 

The motivation for this interpretation is that it suggests ways in which 
the UTM/PCS model may be augmented in a proposed new computing 
system designed to exploit the ICMAUS principles as fully as possible. 
The provision of a relatively sophisticated search mechanism in the pro- 
posed 'SP' system appears to open the door to the integration and simpli- 
fication of a range of functions including unsupervised inductive learning, 
best-match pattern recognition and information retrieval, probabilistic 
reasoning, planning and problem solving, and others. Detailed consider- 
ation of how the ICMAUS principles may be applied to these functions 
is outside the scope of this article but relevant sources are cited in this 
article. 

Key Words: Theory of computing; Turing machine; Post canonical system; 
multiple alignment; new computing paradigms. 
Category: SD F.O 



1 Introduction 

For about 60 years, it has been widely accepted that the intuitive concept of 
'computing' may be defined in terms of the operations of a 'Universal Turing 
Machine' (UTM, |Turing 36- 7| ) and concepts which are recognised as equivalent 
such as the 'Lambda Calculus' (Church and Kleene, see Tlo sser 84] ^ . 'Recur- 
sive Function' Kle ene 36] . 'Normal Algorithm' Mar kov and Nagorny 88| and 
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Post's 'Canonical System' (PCS, [Post 43| ). These and related ideas have been 
phenomenally successful, providing the theoretical basis for the design of most 
modern digital computers. 

However, notwithstanding Turing's own vision of the possibility of artificial 
intelligence |Turing 50| , the UTM and the PCS and equivalent models of com- 
puting operate in a rigid 'clockwork' manner which appears far removed from 
the fluidity of human intelligence. Likewise, ordinary digital computers, with- 
out special programming, are notoriously lacking in human-like flexibility and 
intelligence. 

Because the UTM and equivalent models, and a conventional computer in 
'raw' state, have so little functionality and so little intelligence, it is necessary 
for practical purposes to create programs which provide the missing capabili- 
ties, as far as these are understood. Although our understanding is still very 
patchy, there is increasing success with AI applications in achieving human-like 
capabilities such as speech recognition, the ability to play games like chess, go or 
bridge, the ability to recognise visual patterns and objects in a flexible manner, 
and others. 

Since computers were first invented, much effort has been devoted to the 
development of programs and associated concepts across a wide range of ap- 
plications. Many areas of application are now quite well served by specialised 
programs. However: 

• Notwithstanding the development of 'suites' of programs intended to work 
together, there are still many problems of integration between different 
kinds of application. 

• Similar concepts have emerged in diverse forms. For example, the con- 
cept of 'heuristic search' manifests itself in the form of 'hill climbing' 
(or 'descent'), 'genetic algorithm', 'simulated annealing', 'dynamic pro- 
gramming', 'neural computation', and others. Without normally being 
recognised as such, heuristic methods also appear in 'standard' computa- 
tional techniques such as the Newton-Raphson method for finding square 
roots, dynamic programming and techniques for matching patterns and 
retrieving information. 

• The diversity of programs and concepts is collectively, very complex. 
There is a need for rationalisation and simplification across diverse ar- 
eas. 

1.1 New developments 

These considerations have been the main motivation for a programme of research 
f |Wolft'90| to [Wolff 99 ) seeking new insights into the nature of computing, and 
aiming to develop a 'new generation' computing system based on these ideas. 

If we can identify the elements which are shared by diverse applications and 
abstract them from the elements which are truly specific for each application, 
this may help us to achieve an overall simplification of computing systems and 
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better integration across applications. It can mean less programming - and thus 
less cost - and there should be less opportunity to introduce errors - yielding 
gains in quality. 

As we shall see (Section[2J), the central idea in this research is the conjecture 
that "all kinds of computing and formal reasoning may usefully be understood 
as information compression by multiple alignment, unification and search (IC- 
MAUS)". This is still only a conjecture and may turn out to be wrong. But 
there is already an accumulation of supporting evidence described and discussed 
in sources cited in Section ll~2l 

The use of the word "theory" for these proposals (Section [2Jl may suggest 
that they are intended to displace established theories of computing. It is more 
accurate to say that the proposals are a framework of ideas which provide an 
interpretation for established models of computing and which suggest ways in 
which those models may be augmented to provide additional functionality which 
is generic for many applications. 

As a test-bed for these ideas and a vehicle for demonstrations, a software 
model has been developed, SP61, running on a conventional computer. This is 
described briefly in Section 12.61 and more fully in sources cited there. All the 
alignments shown in this paper are output from the model. 

1.2 Scope and orientation of this paper 

Regarding the primary motivation for these developments - to promote simplic- 
ity and integration in computing - no attempt will be made in this paper to 
demonstrate the potential of these ideas. Readers are referred to the following 
sources describing how the ideas may apply in diverse areas of computing: 1 

• Background thinking (the pervasive nature of information compression in 
human cognition and also in artificial computing): Wolff 93 . 

• Unsupervised inductive learning: 

— The entire programme of research is based on a previous programme 
of research into unsupervised inductive learning of language ( Wolff 88 
Wolff 82 and earlier articles cited in these two articles). 

— A demonstration of how an early form of the framework of ideas may 
be applied to learning is presented in |Wolff 95b| . 

— Integration of unsupervised inductive learning and different kinds of 
reasoning: [Wo lff 96 . 

• Best match information retrieval: |Wolff 99llWolff 94a| . 

• The design and execution of software: Wolff 94b . 

The papers before 1996 were written at a stage before the 'multiple alignment' insights 
were achieved (see Section |3 below). 
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• The potential range of applications of the SP concepts: [Wolff 95al 
I Wolff 95 b| . 

• Causality: [Wolff 97| . 

• Parsing of natural language and representing the structure of natural lan- 
guage in the ICMAUS framework [Wolff 98dll IWolff 98d2l IWolff 98d3| . 

• Probabilistic reasoning in the ICMAUS framework [Wolff 991 IWolff 98al 
IWolft 98birWoltt 98c| . including: 

— Representing class hierarchies and inheritance of attributes. 

— Best-match pattern recognition and information retrieval. 

— Probabilistic 'deductive' reasoning. 

— Probabilistic abductive reasoning. 

— Reasoning with probabilistic decision networks and decision trees. 

— Reasoning with probabilistic if-then rules. 

— Reasoning with default values and nonmonotonic reasoning. 

— Solving geometric analogy problems. 

— Modelling the functioning of a Bayesian network. 

The primary purpose of this paper is to describe the SP theory in outline, 
to show how it may provide an interpretive framework for the UTM and PCS, 
and to describe how those models may be augmented for greater generic func- 
tionality. 

Apart from Appendix^ the presentation is relatively informal with exam- 
ples to make the ideas clear. No attempt has been made to present a formal 
proof that the ICMAUS framework may model any UTM/PCS, largely because 
a detailed proof would take far more space than is available here. 

1.3 Presentation 

In what follows: 

• Section [21 describes, in outline, the SP theory as it has been developed 
to date, together with some brief remarks about working models of the 
system. 

• Section [31 describes, briefly, the concept of a UTM and, more fully, the 
equivalent concept of a PCS. 

• Section 21 tries to show how the operation of a PCS (and, consequently, 
the operation of a UTM) may interpreted in terms of ICMAUS concepts. 

• Section [5] considers how established models of computing may, with ad- 
vantage, be augmented. 

• Section El contains concluding remarks. 

• Appendix lAl defines the key terms used in this article. 
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2 The SP theory 



As already indicated, the central idea in the 'SP' theory is the conjecture that: 

All kinds of computing and formal reasoning may usefully be under- 
stood as information compression by multiple alignment, 2 unification 
and search. 

Information compression (IC) may be interpreted as a process of maximising 
Simplicity in information (by reducing redundancy) whilst retaining as much as 
possible of its non-redundant descriptive Power. Hence the mnemonic 'SP' 
which has been applied to these ideas. 

In this programme of research: 

• The term multiple alignment (MA) has a meaning which is related to but 
distinct from its meaning in bio-informatics (where the term is normally 
used) . The meaning of the term in this programme of research is explained 
below (Section EU). 

• The term unification means a simple merging of matching symbols and 
sequences of symbols, a meaning which is related to but simpler than the 
concept of 'unification' in logic (where it means the assignment of values 
to variables in two structures so that the two structures become the same). 

As explained more fully below (Section 12.3.31) . the unification of match- 
ing patterns 3 is the core of the mechanism by which information is com- 
pressed. Conjecturally, it is the sole mechanism by which information 
may be compressed - meaning that all other mechanisms may be seen to 
contain unification at a fundamental level. More research is needed to 
establish whether or not this conjecture is true. 

• In this context, the term search means the systematic exploration of the 
abstract space of possible multiple alignments. This can mean exhaustive 
search of the entire space of possible alignments but, as amplified below, 
for most practical problems, there must be some kind of constraint which 
reduces the size of the abstract space being searched. 

• As we shall see (Section l3.3f) . the conjecture about the nature of com- 
puting, above, depends on at least one key assumption about how IC 
is measured. It also needs to be qualified by the observation that, to be 

2 In earlier publications, the conjecture was expressed with the words 'pattern matching' 
in place of the words 'multiple alignment'. However, in the last few years, it has seemed 
appropriate to focus more narrowly on multiple alignment (which is one form of pattern 
matching) as the most promising framework (with unification and search) for the intended 
integration of concepts. 

3 The meaning of the term pattern in this research is defined in Appendix^] It includes the 
meaning of the term sequence but it is more general because it includes arrays of symbols in 
two or more dimensions and it includes sequences (and other arrays) which are discontinuous. 
However, in most contexts in this article, the term pattern may be read as synonymous with 
the term sequence. 
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practical, the process of searching for MAs that yield 'good' IC, it is some- 
times necessary to examine MAs yielding ICs which are zero or negative 
(see Section 

2.1 Minimum Length Encoding and unsupervised induc- 
tive learning 

The SP theory incorporates one version of the principle of 'Minimum Length 
Encoding' (MLE) 4 from the field of inductive inference (see |5olomono ff 641 
Wall ace and Boulton 681 IRissanen 781 |Li and Vitanyi 97| ). MLE is itself a ver- 
sion of 'Occam's Razor'. 

The key idea in MLE is that, in measuring compression, we must 
measure the size of the raw data in its encoded form together with 
the size of the code patterns which are needed to create the encoding. 
Of course, if one or other of these components is constant (as is the case with 
all the examples considered in this article) then, for purposes of comparison, it 
is only necessary to measure the one which varies. 

If both components can vary then, if cither of them is omitted, counter- 
intuitive anomalies can arise. For example: 

• We may create a code pattern comprising the entire contents of the works 
of Shakespeare with a one-bit label. Given that code pattern, we can 
compress the complete works of Shakespeare into one bit. 

• Alternatively, we can create a set of code patterns comprising the letters 
and other characters used to print the text of Shakespeare. In this case, the 
set of code patterns is very small but encoding the works of Shakespeare 
using these patterns gives a result which is the same size as the original! 

Clearly, any realistic compression of the Shakespeare text falls somewhere 
between these two extremes. Minimising the overall size of the set of code 
patterns and the encoded text seems to produce the required balance. 

In connection with the discovery and development of these ideas, 
ISolomono ff 97 writes: 

"/ was trying to find an algorithm for the discovery of the 'best' 
grammar for a given set of acceptable sentences. One of the things I 
sought was: Given a set of positive cases of acceptable sentences and 
several grammars, any of which is able to generate all the sentences, 
what goodness of fit criterion should be used? It is clear that the 'ad 
hoc grammar', that lists all of the sentences in the corpus, fits per- 
fectly. The 'promiscuous grammar' that accepts any conceivable sen- 
tence, also fits perfectly. The first grammar has a long description; 
the second has a short description. It seemed that some grammar 

4 MLE is an umbrella term for concepts of 'Minimum Message Length' encoding (MML) 
and the closely related 'Minimum Description Length' encoding (MDL). 
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half-way between these, was 'correct' - but what criterion should be 
used?' 

As indicated above, the key seems to be to minimise the overall size of the 
'grammar' (repository of code patterns) together with the text which is encoded 
in terms of the grammar. 

Similar ideas have been developed in computer models designed to test 
whether or not 'statistical' mechanisms might provide an explanation of some 
aspects of what children can apparently achieve when they learn their first lan- 
guage (see Wolff 88 , Wolff 82] and earlier papers referenced there) : 

• There is evidence that children can learn the segmental structure of lan- 
guage (words, phrases, sentences) without needing explicit markers at the 
beginnings and ends of these structures or explicit teaching. 

• In a similar way, they appear to be able to 'bootstrap' their knowledge of 
word classes and other disjunctive categories in language without the aid 
of a teacher or other assistance. 

• Apparently without the need for external supervision or correction, chil- 
dren can learn to distinguish 'correct' generalisations of grammatical rules 
from 'incorrect' overgeneralisations despite the fact that, by definition, 
both kinds of generalisation have zero frequency in the child's experience. 

The models that have been developed (described in the sources cited above) 
appear to provide solutions to these problems and, as a by-product of the way 
they work and without ad hoc provision, exhibit several features in their learn- 
ing which correspond remarkably well with a range of phenomena observed 
when children learn their first language. This empirical evidence is reviewed in 
|Wolff 88| . 

2.2 Multiple alignment 

The term multiple alignment is normally associated with the computational 
analysis of (symbolic representations of) sequences of DNA bases or sequences of 
amino acid residues as part of the process of elucidating the structure, functions 
or evolution of the corresponding molecules. 

The aim of the computation is to find one or more alignments which are, in 
some sense, 'good'. What 'good' can mean is discussed briefly below (Section 

An alignment is an arrangement of two or more sequences of atomic symbols 
so that, by judicious 'stretching' of sequences where appropriate, symbols that 
match each other are arranged in vertical columns. 

As a matter of presentation, it is sometimes convenient to allow some 
columns to contain symbols that do not match each other and in this case ver- 
tical lines between symbols are used to mark where there are positive matches 
(hits) between the symbols. An example of an alignment of DNA sequences 
which uses this second convention is shown in Figure ^ 
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Figure 1: A 'good' alignment amongst five DNA sequences. 



2.2.1 Generalisation of the concept of multiple alignment 

In this research, the concept of MA has been generalised in the following way: 

1. One (or more) of the sequences of symbols to be aligned has a special 
status and is designated as 'New'. 

2. All other sequences are designated as 'Old'. 

3. Alignments are evaluated in terms of the compression of New that can be 
achieved by encoding New in terms of sequences in Old, with or without 
the hierarchical encoding of tags mentioned in Section 12.3.11 below. A 
detailed description of how this can be done is presented in Wolff 98a 
Wolff 98d2 with a somewhat less detailed description in Wolff 99 . 

4. An implication of this way of framing the alignment problem is that, 
by contrast with 'multiple alignment' as normally understood in bio- 
informatics, any given sequence in Old may appear two or more times 
in any one alignment and may therefore be aligned with itself (with the 
obvious restriction that any one instance of a symbol may not be aligned 
with itself). 

Regarding the last point, the kind of MA shown in Figure can obviously 
include two or more copies of a given sequence in any one alignment. It is 
unlikely that anyone would bother to do this in practice because it would simply 
lead to the trivial alignment of each symbol in one copy with the corresponding 
symbol or symbols in the one or more other copies. 

What is proposed for the ICMAUS framework is different: any one sequence 
may appear two or more times in an alignment. Each appearance is just that, 
it is an appearance of one sequence, it is not a duplicate copy of a pattern. 

Since each appearance of a pattern represents the same pattern, it makes 
no sense to recognise a hit between a symbol from one appearance and the 
corresponding symbol in another appearance - because this is simply matching 
one instance of the symbol with itself. Any such match is spurious and must be 
forbidden. However, it is entirely possible to recognise a hit between a symbol 
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S N #N V #V #S (1000) 
N j o h n #N (300) 
Nlsusan#N (700) 
V0walks#V (650) 
V 1 r u n s #V (350) 



Figure 2: A simple grammar written as patterns of symbols. For each pattern, there 
is a number in brackets on the right representing the frequency of occurrence of the 
pattern in parsing of a notional sample of the language. 
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Figure 3: Parsing of the sentence 'j o h n r u n s' as an alignment amongst sequences 
representing the sentence and relevant rules in the grammar in Figure [5] 



from one pattern and a different symbol from the same pattern. Examples will 
be seen later. 

2.2.2 A simple example 

To give the reader a flavour of the concept of multiple alignment as it has 
been developed in this research, a very simple example (from Wolff 98dl ) is 
presented here showing how parsing may be understood in these terms. 

Figure [5] shows a 'grammar' for a fragment of English syntax presented as 
patterns as defined in Section Al. This particular example models a context-free 
phrase-structure grammar (CF-PSG) except that there is no rewrite arrow and 
each pattern has a number (shown in brackets on the right) which is its notional 
frequency of occurrence in a parsing of a sample of English. 5 

Figure [3] shows an MA which may be interpreted as a parsing of the sentence 
'j o h n r u n s' in terms of the grammar in Figure |3 In parsings like this, the 
sentence to be parsed plays the role of New and the grammar used for parsing 
is treated as Old. By convention in these kinds of MA, the pattern which is 
New occupies the first row of the alignment and relevant patterns from Old are 
shown in the second and subsequent rows, one pattern per row. 

5 Despite the similarity, in this example, with a CF-PSG, this mode of representing syntax 
appears to have the kind of context-sensitive expressive power which is needed to represent the 
syntax of natural languages and which is missing from CF-PSGs (see [Wolff 98dl Wolf f~98d2l 
IWnUfBHdap. 
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2.3 Evaluation of alignments in terms of information com- 
pression 

Intuitively, a good alignment is one which has many hits (positive matches 
between symbols), few gaps (sequences of one or more symbols which are not 
part of any hit) and, where there are gaps, they should be as short as possible. It 
is possible to use measures like these directly in computer programs for finding 
good MAs and, indeed, they commonly are. However, our confidence in the 
validity of measures like these may be increased if they can be placed within a 
broader theoretical framework. 

In keeping with the remarks about IC, above, it seems that IC, and related 
concepts of probability, 6 may provide that framework. Work on the evaluation of 
MAs in this tradition includes Rci cherte£ al. 73l Felsens tein 81llChan et al. 921 
lAllison and Wallace 94llAllison et al. 92IIWolff 94a| . 

In the present proposals, MAs are also evaluated in terms of IC. However, 
owing to the way in which the concept of multiple alignment has been generalised 
(Section l2.2.1l above), the way in which alignments are evaluated in terms of IC, 
described in outline next, is somewhat different from methods used elsewhere. 

2.3.1 Encoding New in terms of codes for patterns in Old 

There is insufficient space to explain in detail how alignments are evaluated in 
terms of IC in the present scheme. Detailed explanation and examples showing 
how this can be done with multiple alignments can be found in Wolff 98d2 
IWolff 98aj with somewhat less detailed accounts in [Wolff 98dll IWolff 99| . This 
subsection describes the method in outline. 

In Figure □ 'j o h n' in New is aligned with 'N j o h n $N' in Old. It 
should be intuitively clear that we can encode this part of New using the 'code' 
symbols 'N #N'. Likewise 'runs' may be encoded as 'V 1 #V. Thus the 
whole sentence may be encoded as 'N #N V 1 #V with a modest reduction 
from 8 symbols to 6 (assuming, for the sake of simplicity in this example, that 
all symbols are encoded with the same number of bits). 

However, the pattern 'N #N V 1 #V contains the subsequence 'N #N V 
#V. And this subsequence is contained within the pattern 'S N #N V #V #S' 
- which is one of the patterns in Old. This means that the subsequence 'N #N 
V #V may be encoded as 'S #S'. Thus the sequence 'N #N V 1 #V may 
be reduced to 'S 1 #S', preserving the symbols '0' and T' to encode 'j o h n' 
and 'runs' respectively. 

This encoding at two 'levels' (the sentence level and the word level) has the 
effect of reducing the original 8 symbols to 4. It should be clear that, with more 
complex examples, the same principles may be applied recursively through any 
number of 'levels'. 

e Variations in the probability of symbols and patterns has a key role in compression meth- 
ods such as Huffman coding or Shannon- Fano-Elias coding (sec Cover and Thomas 91 ). Con- 
versely, measures of information content after compression can be used to calculate probabil- 
ities (see LSolomonoff"97l IWolff 99l IWolff 98al IWolff 98b1 IWolff 98cl l. 
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Regarding the MLE principle (Section I2.1f> that measures of compression 
should include both the size of the 'grammar' used for coding and the size of 
the data after encoding, this is true in the context of systems that learn - where 
the size of the grammar is changing. As was noted previously, this need not 
apply in examples like the ones considered in this article, where, in each example, 
the size of Old remains constant. In these cases, for purposes of comparison, we 
may ignore the size of Old. 

2.3.2 Taking account of the probabilities of patterns and symbols 

To present the method simply, this outline has assumed that every symbol 
is encoded with the same number of bits as every other symbol. However, 
the method used in the SP61 model (see Section 12.61 below) calculates the 
encoding cost of each symbol individually, taking account of its frequency of 
occurrence using a formula based on the Shannon-Fano-Elias (S-F-E) method 
(See |Cover and ThomasliT] '). Of course, encoding costs calculated in this way 
are totally unrelated to the numbers of characters used in the names of symbols. 
Names of symbols are used purely for the sake of human comprehension and 
readability. 

The sizes of codes assigned to patterns may, in a similar way, be related to 
their frequencies of occurrence using the S-F-E method or the Huffman method 
or something similar. However, for reasons which would take too much space to 
explain here, the SP61 model determines the size of the code for each pattern 
from the sizes of the symbols which are used in the code for the given pattern. 
This yields code sizes which are roughly in line with the principle that rare 
patterns should have longer codes than common patterns but the result is not 
as precise as with Huffman and similar methods. 

2.3.3 The role of unification in information compression 

It may seem too natural and 'obvious' to command much attention but the 
essence of the compression just described is the merging or unification of two or 
more sequences of symbols to make one. At the lowest level, 'j o h n' from New 
is, in effect, unified with 'N j o h n #N' from Old. At the next level, 'N #N 
V #V from two of the patterns in Old is unified with 'S N #N V #V #S', also 
from Old. 

It is this unification of matching patterns, with or without the use of 'codes' 
or 'tags' to show where information has been deleted, which provides the foun- 
dation of all the simpler 'standard' methods for IC (see [Storer 88] ) and, by 
conjecture, all other methods as well. 

In the ICMAUS scheme, the touchstone of whether the necessary unifications 
can be applied is whether or not an alignment can be 'projected' into a single 
sequence without any ambiguity about the left-to-right sequence of the symbols. 
In the example shown in Figure the alignment can be projected into a single 
sequence, thus: 

SNOj ohn#NVlruns#V#S. 
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In this projection, each symbol corresponds to a column in the alignment 
and, in each column containing two or more symbols, the symbols have been 
unified to make one. Every column has a place in the alignment which ensures 
that its left-right position relative to other columns is not ambiguous. 

2.3.4 'Mismatches' in alignments 

By contrast with the example just given, the two 'nonsense' alignments shown 
here: 

a b a b 

II II 

a x b and a b x 

II II 

a y b a b y 

exhibit ambiguity about the relative left-to-right positions of 'x' and 'y'. This 
means that neither of the alignments can be projected into a single sequence 
without making arbitrary choices, not supported by the data, about whether 
'x' should precede 'y' or vice versa. 

This kind of ambiguity in an alignment is described, in this research, as a 
'mismatch' of symbols. In the ICMAUS scheme as it has been developed to date 
(and the SP61 model in which the scheme has been expressed) any alignment 
containing one or more mismatches between symbols in Old is discarded. It 
is possible, of course, to devise some scheme for resolving this kind of conflict 
using some kind of measure of 'strength' or 'support' for each of the two alter- 
natives. At present, however, the theory in its current form does not provide 
any motive for this kind of manoevre and there seems not to be any need for 
it. Thus, pending new insights which might dictate a different course of action, 
any alignment of this kind is simply discarded. 

In the ICMAUS scheme as currently conceived and also in the SP61 model, 
mismatches between symbols in New and symbols in Old are taken to mean sim- 
ply that the offending part of New cannot be encoded in terms of the given align- 
ment. Alignments with this kind of mismatch are permitted in the ICMAUS 
scheme and seem to be necessary in most parsing applications so that larger 
alignments encoding relatively large parts of New can be built from smaller 
ones which encode relatively small sections of New. 

2.3.5 Information compression and Algorithmic Information Theory 

In Algorithmic Information Theory (see, for example |Li and Vitanyi 97| ), a 
body of information is deemed to be random if no computer program can be 
found to generate the information which is smaller than that body of informa- 
tion. This elegant approach to defining randomness (which has advantages over 
earlier concepts of randomness) suggests that information compression should 
be defined in terms of relative sizes of computer programs. 
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The types of encodings described above may not look much like computer 
programs but there are remarkably close parallels between the devices used in 
structuring computer programs and the devices used for information compres- 
sion. It would take too much space to discuss this here but readers may find a 
relatively full discussion in Wolff 93 . 

2.4 Search 

It is generally understood that, in multiple alignment problems, the abstract 
space of possible alignments (and corresponding unifications) is normally too 
large to be searched exhaustively. For example, the size of the search space for 
alignments of one sequence (A) of length m with another sequence (B) of length 
n is: 

m n 
t=l 3=1 

where is the number of subsequences of length i within A and bj is the number 
of subsequences of length j in B, and a, is calculated as: 

ml 

1 i\(m — i)l 

and likewise for bj, substituting n for m. Even in simple cases like these, the 
size of the search space grows exponentially with the size of the patterns. In 
general, search spaces are astronomically large except when patterns are very 
few and very short. 

2.4.1 Constraints on search 

Because the entire search space for an MA is normally so very large, it is nec- 
essary for practical purposes to constrain the search in some way so that only 
a part of the search space is examined. In broad terms, this can be done in two 
different ways (which may be applied singly or together): 

• More or less arbitrary constraints may be applied. For example, a limit 
may be placed on the maximum size of any gap (sequence of unmatched 
symbols) in any alignment. All other alignments are excluded from con- 
sideration. With this kind of constraint, certain parts of the search space 
are ruled out a priori and can never be reached. 

• The second kind of constraint is applied in 'heuristic' techniques like 'hill 
climbing' ('descent'), 'beam search', 'genetic algorithm', 'simulated an- 
nealing' and so on. These techniques use some kind of measure of 'good- 
ness' to narrow the space which is to be searched in successive stages. 
For this reason, they are sometimes known generically as 'metrics-guided 
search'. With heuristic techniques, the space which is to be searched can 
be dramatically reduced but no part of the total search space is ruled out 
a priori. 



13 



With constraints like these, there is normally a trade-off between the 'qual- 
ity' of the best solutions found and the thoroughness of the search. With these 
kinds of technique, ideal solutions can be guaranteed only when examples are 
very small. However, with examples of realistic size, narrowing the search space 
as described above can yield good approximate solutions without excessive com- 
putational demands. For many practical applications, these approximate solu- 
tions are quite acceptable. 

2.4.2 Escaping from 'local peaks' 

As just noted, a recognised feature of heuristic search techniques like hill climb- 
ing is that the search process may reach solutions to the search problem which 
are relatively good but are not the best that may be found. With all but the 
very smallest examples, it is not possible to prove that a given solution is the 
best possible solution. With every solution, there is normally the possibility 
that a better solution may exist elsewhere in the search space. 

If, for a given solution, we wish to discover whether a better one may be 
found elsewhere, it is necessary for the search process to be able to move 'down 
hill' from the local peak, in the hope that, after some backtracking, a route 
may be found to something which is 'higher' than the local peak. Alternatively, 
search may proceed along two or more parallel or pseudoparallel paths and 
should be allowed to continue even when one or more peaks have been found - 
to allow for the possibility that paths which by-pass the peaks which have been 
found may lead to other peaks which are even higher. 

In the present context, the need to be able to escape from local peaks means 
that the search process needs to be able to create alignments yielding relatively 
poor compression of New in terms of Old. It has been found in practice with 
the SP61 and earlier models (see Section I21fl below) that, notwithstanding the 
conjecture on which this research is based (presented at the beginning of Section 
0), it can be advantageous on occasions for the system to create alignments in 
which IC is very small or even negative. 

2.5 'New' and 'Old' and established concepts in comput- 
ing 

In case concepts like 'New' and 'Old' seem too far removed from computing as 
normally understood, it may be useful at this stage to indicate the direction in 
which these proposals are going. To this end, Table ^ shows how, in the current 
proposals, concepts of New and Old may be mapped on to established concepts 
in computing. 

2.5.1 Unsupervised inductive learning 

In keeping with remarks at the beginning of Section [2 about the origins of the 
MLE principle as a key to understanding unsupervised inductive learning, this 



14 



Area of 
application 

Unsupervised 
inductive learning 

Parsing 

Pattern 
recognition 
and scene 
analysis 



New 

'Raw' data. 



The sentence 
to be parsed. 

A pattern to be 
recognised 
or scene to be 
analysed. 



Old 



Grammar or other 
knowledge structure 
created by learning. 

The grammar used for 
parsing. 

The stored knowledge 
used to recognise 
one pattern or several 
within a scene. 



Databases 



A 'query' in SQL or 
other query language. 



Records stored 
in the database. 



Expert 
system 



A 'query' in the 
query language for 
the expert system. 



The 'rules' or other 
knowledge stored in 
the expert system. 



Computer The 'data' or 

program 'parameters' 

supplied to the 

program on each run. 

Table 1: The way in which the concepts of 'New' and 'Old' in this research appear to 
relate to established concepts in computing. 



The computer program 
itself. 
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kind of learning - shown first in Table - provides the overall framework for 
these ideas. 

It is envisaged that compression of any reasonably large body of information 
will normally be done in an incremental manner, processing one section at a 
time: 

• In keeping with the use of heuristic techniques to accommodate very large 
search spaces, incremental processing means that, at any one time, the 
space of possible alignments can be significantly smaller than if the body 
of information is processed all at once. 

• This incremental approach to compression is convenient also when New 
information is supplied in an incremental manner as it often is. In this 
case, processing can start immediately without the need to wait until all 
the information has been received. 

In this general framework, it is envisaged that, initially, Old is empty. The 
first section of New is received or selected and immediately added to Old. It 
is then processed and compressed as far as possible in terms of patterns in 
Old. At this early stage, the only compression that can be achieved is due to 
redundancies that may be present within the first section from New. 

In all subsequent stages, processing is done in the same way: each new section 
of New is added to Old, a set of multiple alignments are formed between New 
and Old and the best one is selected as a basis for encoding and compressing 
New in terms of Old. As successive sections of New are processed, the range 
of patterns in Old will gradually increase so that, generally, it will become 
progressively easier to find good matches and useful unifications. 

At every stage, the system attempts to encode New information using 'codes' 
or 'tags' which are attached to patterns in Old. If suitable tags are not present, 
the system creates new ones and attaches them as required to patterns or parts 
of patterns in Old. 

2.5.2 Other aspects of computing 

It is envisaged that each of the other manifestations of computing which are 
shown in Tabled an d others, may be understood as one part of the inductive 
learning process described in the last subsection. Each of these kinds of com- 
puting may be seen as being analogous to the processing of one section of New 
so that, via the formation of MAs, it is compressed in terms of existing patterns 
in Old using existing codes. The example in Section 12 . 2 . 21 illustrates this idea. 

2.6 A working model 

With the exclusion of the processes for laying down new patterns in Old, mod- 
ifying existing patterns in Old and attaching new codes in appropriate places, 
the concepts which have been outlined above are embodied in a software model, 
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SP61, developed as a test-bed for these ideas and as a vehicle for demonstra- 
tions. It is envisaged that the next phase in the development of the model will 
give it a capability for unsupervised inductive learning: the ability to lay down 
new patterns in Old, modify existing patterns in Old and assign new codes in 
appropriate places. 

A description of the model in its current form is presented in Wolff 98a 
with a somewhat less detailed description in Wolff 99 . A fuller description of 
a similar but slightly earlier version of the model may be found in Wolff 98d2j 
with a briefer description in [Wolff 98dl . Examples from these and earlier 
versions may be found in the sources cited in Section IT21 All the examples of 
alignments which are shown later in this article are output from SP61, edited 
as necessary to fit each alignment onto the page. 

When the concepts and the model are more mature, it is intended that the 
model will form the basis for the development of an SP system for practical 
applications. 



2.7 Related ideas 

Since the primary goal in the development of the ICMAUS framework has been 
to integrate concepts across a wide range of subfields of computing, it is not 
altogether surprising that the framework has a family resemblance to other 
systems. The ability of the ICMAUS framework to model a range of kinds of 
computing systems was detailed in Section ll. 21 

This section notes of some systems with points of similarity with the IC- 
MAUS scheme: 

• The GAMMA parallel programming model 
|Murthy and Krishnamurthy 95| is based on a chemical reaction model 
in which the datastructure is a bag or multiset. The computations are 
interpreted as a succession of chemical reactions consuming the elements 
of the multiset and producing new elements according to specific rules. 
Computations are achieved via two basic mechanisms: searching for 
elements of the multiset that satisfy the reaction conditions and, for each 
element found, the execution of relevant actions. Any number of reactions 
may be performed in parallel provided they operate on disjoints sets of 
elements. Where the reaction conditions select two or more elements, the 
choice of elements may be made probabilistically. 

• A 'classifier system' [Forrest 91] comprises a set of production rules oper- 
ating in a high-parallel environment, a 'bucket brigade' learning algorithm 
for adjusting weights on rules and a genetic algorithm for the learning of 
new rules. 

• More generally, rule-based programming is a simple scheme 
with a wide range of applications (See, for example, 
|Krishnamurthy and Krishnamurthy 99| ). Computations are achieved by 
the interaction of relatively simple elements (rules) in an 'object space' 
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that produce new elements or the same elements with modified attributes. 
Within disjoint subsets of the object space, interactions may proceed in 
parallel. Pattern matching is the basis of how elements are selected. As 
with the GAMMA model, rule-based programming may be adapted to 
operate in a non-deterministic, probabilistic manner. 

• The TREAT algorithm Mirankcr 90| is a matching algorithm designed for 
efficient pattern matching in a production rule system, running on a spe- 
cific high-parallel architecture ('DADO'). As with other systems described 
above, pattern matching and a high-parallel environment are prominent 
features of the system. 

The ICMAUS framework resembles these systems in the use of many rela- 
tively simple standardised entities as the basis of knowledge representation and 
in the emphasis on pattern matching in the way knowledge is processed. The 
high-parallel environment finds an echo in the high-parallel form of the pro- 
jected 'SP' computing system, based on the ICMAUS abstract framework. The 
probabilistic dimension of the ICMAUS scheme is another feature shared with 
schemes like those just mentioned. 

Perhaps the most important difference between these systems and the IC- 
MAUS framework is in the intended scope of the system. By conjecture, the 
scope of the ICMAUS framework is as broad as computing itself: nothing is 
taken for granted including such seamingly basic notions as 'variable', the num- 
ber system, negation, comparisons like 'greater than' or 'less than', and so on. 
The research aims, amongst other things, to provide an interpretation of con- 
cepts like these. Also distinctive of the ICMAUS framework is the emphasis 
on IC as a unifying theme and, by conjecture, unification of matching patterns 
as the sole mechanism for achieving IC. Other differences are that 'patterns' in 
the ICMAUS scheme are even simpler than production rules (patterns lack a 
rewrite arrow and are without any distinction between 'condition' and 'action') 
and that multiple alignment is prominent in the ICMAUS framework in a way 
that it is not in other systems. More controversially, perhaps, the probabilistic 
dimension of the ICMAUS scheme [Wolff 99llWolff 98al IWolff 98bl IWolff 98c| is 
intrinsic to the basic conceptual framework in a more fundamental way, perhaps, 
than it is in other schemes. 

2.7.1 DNA computing 

[Adfeman 94j demonstrated that simple versions of the Hamiltonian path prob- 
lem could be solved using DNA in solution (see also [Adleman 98| 1. Given the 
prominence of multiple alignment in the ICMAUS framework and given that 
MA is often associated with the analysis of DNA molecules, one might natu- 
rally assume a relationship between DNA computing and the ICMAUS scheme. 

The main significance of DNA computing in relation to the ICMAUS con- 
cepts is probably that chemical reactions with DNA (and other molecules) pro- 
vide a means of matching patterns and this may be done with massive par- 
allelism. Thus potentially, this mode of computing provides a vehicle for the 
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pattern matching required in the ICMAUS scheme and the high levels of paral- 
lelism which may help to achieve useful speeds. 

In Adleman's first experiments the pattern matching was an all-or-nothing 
kind, not in itself appropriate for the kind of partial matching required 
for ICMAUS in its more fully developed forms. However, later work 
|Baum and Boney 96| has shown that pattern matching with DNA can sup- 
port the more sophisticated kinds of heuristic processing needed to find good 
partial matches between patterns. 

3 Universal Turing Machine and Post Canonical 
System 

In its standard form |Turing 36- f| , a UTM comprises a 'tape' on which symbols 
may appear in sequence, a 'read- write head' and a 'transition function'. In each 
of a sequence of steps, a symbol is read from the tape at the current position of 
the head and then, depending on that symbol, on the transition function and 
the current 'state' of the head, a new symbol is written on the tape, the state 
of the head is changed, and the head moves one position left or right. 

A more detailed description of UTMs is not needed here because the argu- 
ments to be presented take advantage of the fact |Minsky 67\ chapters 10 to 14] 
that the operation of any UTM can be modelled with a PCS [Post 43| . In this 
section, I shall describe the workings of a PCS and then, in Section 0] I shall 
try to show how the operation of a PCS, and thus a UTM, may be understood 
in terms of ICMAUS. 

3.1 The structure of a PCS 

A PCS comprises: 

• An alphabet of primitive symbols ('letters' in Post's terminology), 

• One or more primitive assertions or axioms (strings of symbols which 
can often be regarded as 'input'). 

• One or more productions which can often be regarded as a 'program'. 

• Although this is not normally included in formal definitions of the PCS 
concept, it is clear from how the system is intended to work (see Section lT^l 
below), that the PCS also includes mechanisms for matching patterns 
and for searching for matches between leading substrings in the input 
pattern and the left-hand ends of the productions. 

Each production has this general form: 7 

7 Spaces between symbols here and in other examples have been inserted for the sake of 
readability and because it allows us to use atomic symbols where each one comprises a string 
of two or more characters (with spaces showing the start and finish of each string). Otherwise, 
spaces may be ignored. 
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go $1 9l $2 •■• $n 9n ~ * ^0 $1 ^-1 $2 ••• $m 

where "Each gi and hi is a certain fixed string; go and g ra are often null, and 
some of the h's can be null. Each $^ is an 'arbitrary' or 'variable' string, which 
can be null. Each $' ; is to be replaced by a certain one of the $;." |Minsky 67[ 
pp. 230-231]. 

In its simplest 'normal' form, a PCS has one primitive assertion and each 
production has the form: 

g $ -►$ h 

where g and h each represent a string of zero or more symbols, and both in- 
stances of '$' represent a single 'variable' which may have a 'value' comprising 
a string of zero or more symbols. 

It has been proved [Post 43| that any kind of PCS can be reduced to a PCS 
in normal form (see also |Minsky 67| chapter 13]). That being so, this kind of 
PCS will be the main (but not exclusive) focus of our attention. 

3.2 How the PCS works 

When a PCS (in normal form) processes an 'input' string, the first step is to find 
a match between that string and the left-hand side of one of the productions in 
the given set. The input string matches the left hand side of a production if a 
match can be found between leading symbols of the input string and the fixed 
string (if any) at the start of that left-hand side, with the assignment of any 
trailing substring within the input string to the variable within the left-hand 
side of the production. 

Consider, for example, a PCS comprising the alphabet 'a ... z', an axiom or 
input string 'a b c b t', and productions in normal form like this: 

a $ — » $ a 
b $ -> $ b 
c $ -> $ c 

In this example, the first symbol of the input string matches the first symbol 
in the first production, while the trailing 'b c b t' is considered to match the 
variable which takes that string as its value. The result of a successful match like 
this is that a new string is created in accordance with the configuration on the 
right hand side of the production which has been matched. In the example, the 
new string would have the form 'bebt a', derived from 'b c b t' in the variable 
and 'a' which follows the variable on the right hand side of the production. 

After the first step, the new string is treated as new input which is processed 
in exactly the same way as before. In this example, the first symbol of 'b c d 
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t a' matches the first symbol of the second production, the variable in that 
production takes 'c d t a' as its value and the result is the string 'c b t a b'. 

This cycle is repeated until matching fails. It should be clear from this 
example that the effect would be to 'rotate' the original string until it has the 
form 'tabc b'. The 't' which was at the end of the string when processing 
started has been brought round to the front of the string. This is an example 
of the "rotation trick" used by |Minsky 67| Chapter 13] in demonstrating how 
a PCS in normal form can model any kind of PCS. 

With some combinations of alphabet, input and productions, the process of 
matching strings to productions never terminates. With some combinations of 
alphabet, input and productions, the system may follow two or more 'paths' 
to two or more different 'conclusions' or may reach a given conclusion by two 
or more different routes. The 'output' of the computation is the set of strings 
created as the process proceeds. 

3.3 Other examples 

For readers unfamiliar with this kind of system, two other examples are included 
here to show more clearly how a PCS may achieve 'computing'. The first exam- 
ple is based on two examples from |Minsky 67| Chapter 12] showing how odd 
and even unary numbers may be generated. The second example is from the 
same source. Other examples may be found in the same chapter. 

3.3.1 Generating numbers in unary notation 

In the unary number system, 1 = '1', 2 = '1 1', 3 = '1 1 1' and so on. The 
recursive nature of the unary number system can be expressed very clearly and 
succinctly with a PCS like this: 

• Alphabet: the single symbol 1. 

• Axiom: 1. 

• Production: If any string '$' is a number, then so is the string '$ 1'. This 
can be expressed with the production: 

$ -> $ 1 

It should be clear from the description of how the PCS works, above, that 
this PCS will generate the infinite series of unary strings: '1', '1 1', '1 1 1', '1 1 
1 1', T 1 1 1 1' etc. 

Less obvious, perhaps, is that the same PCS can be used to recognise a 
string of symbols as being an example of a unary number. This is done by using 
the production in 'reverse', matching a character string to the right hand side 
of the production, taking the left hand side as the 'output' and then repeating 
the right-to-left process until only the axiom will match. 
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3.3.2 Generating palindromes 



To quote |Minsky Q7\ p. 228]: "The 'palindromes' are the strings that read the 
same backwards and forwards, like cabac or abcbbcba. Clearly, if we have a string 
that is already a palindrome, it will remain so if we add the same letter to the 
beginning and end. Also, clearly, we can obtain all palindromes by building in 
this way out from the middle." Here is his example: 

• Alphabet: a, b, c. 

• Axioms: a, b, c, aa, bb, cc. 

• Productions: 

$ — > a $ a 
$ -> b $ b 
$ -> c $ b 

Although this example is not in normal form, it should be clear how it 
generates all the palindromes for the given alphabet. As with the previous 
example, this PCS may be used to recognise whether or not a string of characters 
is a palindrome by using the productions in 'reverse'. 

4 ICMAUS and the operation of a PCS 

It is not hard to see that a successful match between a leading substring in an 
input string and the leading substring in a PCS production may be interpreted 
in terms of the alignment and unification of patterns, as outlined in Section 
[21 What is less obvious is how concepts of alignment and unification might be 
applied to the processes of matching a trailing string of symbols to the variable, 
assigning it as a value to the variable and incorporating it in an output string. 

4.1 Modelling one step of the "rotation trick" 

In the proposed SP framework (ICMAUS), the first PCS described above (for 
the 'rotation' of a string) may be modelled using the patterns shown in Figure 
0] Regarding these patterns: 

• The pattern under the heading 'New' corresponds to the axiom or 'input'. 

• The four patterns on the left under the heading 'Old' may be seen as a 
representation of the alphabet. 

• The first three patterns on the right correspond to the three productions 
of the PCS. As we shall see, the pair of symbols, '$ #$', in each pattern 
function as if they were a 'variable'. There is no rewrite arrow ('—>') and 
only one instance of the 'variable'. 
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New 



a b c b t 



Old 



L a #L 

L b #L 

L c #L 

L t #L 



p a $ #$ a #P 
P b $ #$ b #P 
P c $ #$ c #P 
$ L #L $ #$ #$ 



Figure 4: SP patterns modelling the first example of a PCS described in Section fH.il 
An interpretation of these patterns is given in the text. 

• As we shall see, the last pattern on the right ('$ L #L $ =#=$ #$') is, in 
effect, a recursive definition of the concept of 'any string of the letters in 
the given alphabet'. This is comparable with the production shown in the 
example in Section f3 . 3 . 1 1 which expresses the recursive nature of the unary 
number system. It is, in effect, a 'type definition' for the variable defining 
the infinite set of alternative strings (composed from the given alphabet) 
which the variable may take. 8 

In these patterns, none of the symbols have any 'hidden' meaning. In par- 
ticular, the symbols '$', '#$', 'L', '#L', 'P' and '#P' (which may be described, 
informally, as 'service' symbols) have exactly the same formal status as other 
symbols ('a', 'b' etc) and enter into matching and unification in the same way. 9 

4.1.1 One step 

Figure El shows the best alignment (in terms of IC) found by SP61 with the 
'input' pattern in New and the other patterns in Old. In the figure, the repeated 
appearances of the pattern '$ L #L $ #$ represent a single instance of 
that pattern in accordance with the generalisation of the multiple alignment 
problem described in Section ?2. 2. II The same is true of the two appearances of 
the pattern 'L b #L'. 

Unification of the matching symbols in FigureJSJhas the effect of 'projecting' 
the alignment into a single sequence, thus: 



P a $ L b #L $ L c #L $ L b #L $ L t #L $ #$ #$ #$ #$ #$ a #P. 



8 Readers will note that the pattern '$ L #L $ #$' contains two 'variables': the pair of 
symbols '$ #$', already considered, and also the pair of symbols 'L #L' which can take any 
of 'a', 'b' or 'c' as its 'value'. 

"Although it is true in general for the ICMAUS framework that all symbols have the same 
status, the principle should perhaps be qualified here since, at the end of each cycle, it is 
proposed below that the 'service' symbols are removed from the 'output' string leaving only 
the 'data' symbols for further processing. 
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a 
I 

1 P a 

2 
3 
4 
5 
6 
7 
8 
9 



#L $ 
I I 



L b #L 



#L $ 
I I 



L c #L 



#L $ 
I I 
L b #L I 
I 

$ L 



#$ a #P 1 
I 



#$ #$ 



#$ #$ 



#$ #$ 
I 

I 



#L $ #$ #$ 
I 

L t #L 



Figure 5: The best alignment (in terms of IC) found by SP61 using the patterns 
shown in Figure 0] as described in the text. 



Notwithstanding the interpolation of instances of 'service' symbols like 'P', '$', 
'L', '#L' etc, this sequence contains the subsequence 'a b c b t' corresponding 
to the 'input' and also the subsequence 'b c b t a' corresponding to the 'output' 
of the first step of our first example of a PCS. 

In this example, the subsequence 'b c b t' corresponding to the contents of 
the 'variable' is shared by the input and the output thus removing the need for 
the rewrite arrow in the PCS. An analogy for this idea is the way Welsh and 
English arc sometimes conflated in bilingual signs in Wales. The phrase 'Cyngor 
Gwynedd' in Welsh means 'Gwynedd Council' in English. So it is natural, in 
bilingual signs put up by Gwynedd Council, to represent both the Welsh and 
the English forms of the name using the words 'Cyngor Gwynedd Council'. In 
this example, the word 'Gwynedd' serves both the Welsh version of the name 
and the English version in much the same manner that, in our example above, 
the contents of the variable serves both the input string and the output string. 

4.1.2 Repetition of steps 

The second and subsequent steps may proceed in the same way. At the end of 
each step, we may suppose that the 'service' symbols are stripped out and the 
sequence from the contents of the 'variable' onwards is presented to the system 
again as New. When 't' reaches the front of the string, there is no production 
with a leading 't' which means that New at that stage cannot be fully matched. 
In this example, this may be taken as a signal that the rotation is complete. 
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New 



1 (Other options for New are described in the text) 
Old 

$ $ #$ 1 #$ 

Figure 6: Patterns for processing by SP61 to model a PCS to produce or recognise 
unary numbers (Section l3,3.H . 

In Minsky's example (which takes too much space to reproduce here), the 
presence at the beginning of the string of 'Tl' (which may be equated with 't' 
in our example) causes other productions to 'fire', leading the processing into 
other paths. 

4.2 Other examples 

This subsection shows how the second and third examples of a PCS which were 
described above may be modelled as ICMAUS. 

4.2.1 Unary numbers and ICMAUS 

Figure El shows two patterns to model the example of a PCS for generating 
unary numbers ( Section 13.3.11 above). The first pattern corresponds with the 
axiom in the PCS and the second one corresponds with the production in the 
PCS. 

If the two symbols '$ #$' in the body of the pattern '$ $ #$ 1 #$' have 
nothing between them, the pattern represents '1' by itself. Otherwise, the first 
and last symbols in the pattern can be matched with the same pair in the body 
of the pattern - and this pair is followed by '1'. Thus the pattern '$ $ =#=$ 1 
may be read as "A unary number is '1' or any unary number followed by '1'". 

If the first of the two patterns is supplied to SP61 as New and the second is 
placed in Old, the program will generate a succession of 'good' alignments, one 
example of which is shown in Figured (a). If it is not stopped, the program 
will continue producing alignments like this until the memory of the machine is 
exhausted. 

If we project the alignment in Figure 0(a) into a single sequence and then 
ignore the 'service' symbols in this example ('$' and '#$'), we can see that the 
system has, in effect, generated the unary number '11111'. We can see from 
this example how the alignment has captured the recursive nature of the unary 
number definition. 

Readers may wonder whether it makes any sense to 'compress' something 
as small as '1' using a relatively large pattern like '$ $ 1 #$' which itself 
contains '1', especially if it is repeated several times in an alignment. Relevant 
discussion may be found in Section 13.31 below. 
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1 

2 
3 

4 $ 
I 

5 $ $ 
(a) 


1 
2 
3 
4 



$ $ 



$ $ 



$ $ 



#$ 1 #$ 
I 

#$ 1 #$ 



#$ 1 #$ 
I 

#$ 1 #$ 




1 
2 
3 

4 



I I 
#$ 1 #$ I 
I I 



#$ 1 #$ 
I 

#$ 1 #$ 
I 

#$ 1 #$ 



#$ 1 #$ 5 



5 $ $ 
(b) 



#$ 1 #$ 5 



Figure 7: (a) One of many 'good' alignments produced by SP61 from the patterns in 
Figure E| (b) The best alignment (in terms of compression) produced by SP61 with 
'1 1 1 1 1' in New and, in Old, the patterns from 'Old' in Figure 
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New 



acbababca 

(Other options for New are described in the text) 
Old 

L a #L 
L b #L 
Lc#L 

LI a #L1 L2 a #L2 
LI b #L1 L2 b #L2 
LI c #L1 L2 c #L2 

$ L #L #$ 

$ LI #L1 $ #$ L2 #L2 #$ 

Figure 8: Patterns for processing by SP61 to model a PCS to produce or recognise 
palindromes (Section 13.3.21 . 

Figure 0(b) shows the best alignment produced by SP61 when '1' in New 
is replaced by '1 1 1 1 1'. This alignment is, in effect, a recognition of the fact 
that '1 f f 1 f ' is a unary number, ft corresponds to the way a PCS may be 
run 'backwards' to recognise input patterns but, since there is no left-to-right 
arrow in the ICMAUS scheme, the notion of 'backwards' processing does not 
apply. Other unary numbers may be recognised in a similar way. 

4.2.2 Palindromes and ICMAUS 

Figure|Hlshows patterns that may be used with SP61 to model the PCS described 
in Section 13.3.21 The first six patterns in Old may be seen as analogues of the 
six axioms in the PCS. The last pattern in Old expresses the recursive nature of 
palindromes. The pattern above it serves to link the single letter patterns (the 
first three in Old) to the recursive pattern. 

Figure ED shows the best alignment produced by SP61 with 'acbababc 
a' in New and, in Old, the patterns under the heading 'Old' in Figure 00 The 
alignment may be seen as a recognition that the pattern in New is indeed a 
palindrome. Other palindromes (using the same alphabet) may be recognised 
in a similar way. 

As with the example of unary numbers, the same patterns from Old may be 
used to generate palindromes. In this case, some kind of 'seed' is required in 
New like a single letter ('a', 'b' or 'c') or a pair of letters ('a a', 'b b' or 'c c'). 
Given one of these 'seeds' in New, the system can generate palindromes until 
the memory of the machine is exhausted. 

Readers may wonder whether the patterns from Old in Figure [3] might be 
simplified. Would it not be possible to substitute '$ a #$' for 'L a #L' (and 
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likewise for the other single letter 'axioms') and to replace 'LI a #L1 L2 a #L2' 
with '$ a $ #$ a #$' (and likewise for the other pairs) and then remove '$ L #L 
#$' and '$ LI #L1 $ #$ L2 #L2 #$'? This is possible but the penalty is that, 
for reasons that would take too much space to explain, searching for the best 
alignments takes very much longer and leads along many more 'false trails'. 

4.3 Information compression and the operation of a PCS 

It should be clear from the foregoing discussion and examples that the process 
of matching a string of symbols with one production in a PCS may always be 
modelled as the formation of an alignment as defined in this research. 

Given the conjecture which provides the inspiration for this research (de- 
scribed at the beginning of Section [2Jl , a pertinent question to ask is: "Does 
every alignment of this kind always yield compression of New in terms of pat- 
terns in Old?" 

The answer to this question depends on the details of how the encoding of 
New in terms of Old is done, including assumptions on which any given method 
is based. It is certainly possible to devise encoding methods where the answer 
to the question, above, would be "no". With the method used in SP61 (outlined 
in Section |2.3[) . the answer is "yes", provided that a key assumption is made, 
described and discussed in the following subsections. 

4.3.1 When an encoding has fewer symbols than New 

Let us consider, first of all, an example where the conclusion is likely to be 
uncontroversial even with very conservative assumptions. Let us suppose that 
New contains the sequence 'alongsequenceofsymbols' and that 
Old contains, amongst others, the pattern 'Plalongsequenceofsy 
mbols #P'. 

With this example, it should be clear that New can be aligned with the 
matching sequence of symbols in the pattern from Old and that New may be 
encoded in terms of this pattern as something like 'P 1 #P'. The encoding is 
much smaller than New in terms of numbers of symbols and, if we assume that 
a fixed number of bits is used for the encoding of every symbol, there is also 
compression in terms of bits of information. 

4.3.2 When an encoding has as many or more symbols than New 

If New contains only one symbol, 'a' and if Old contains patterns like 'P 1 
a #P', 'P 2 b #P', and so on, one would naturally assume that no positive 
compression could be achieved. If 'a' in New is aligned with 'a' in 'P 1 a #P', 
then its encoding would be 'P 1 #P' or something similar. If a fixed number 
of bits is used for the encoding of every symbol, then each of these encodings 
represents negative compression. 

In SP61, it is assumed that every symbol in New is larger than the theoretical 
minimum needed to discriminate one symbol from another - which means, in 
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Figure 9: The best alignment (in terms of compression) produced by SP61 with the 
patterns from Figure |H1 in New and Old, as shown in that figure. 
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effect, that each symbol in New is assumed to contains redundancy. This is 
like assuming that each symbol in New represents a relatively large chunk of 
information like 'j o h n' in the example in Section 12.2.21 The assumption 
that each symbol in New is relatively large is coupled with a calculation of two 
sizes for symbols in New: a notional 'actual' size which is relatively large and a 
'minimum' size which is the size of a code which may be used to represent the 
symbol and is close to the theoretical minimum. For clarity in this discussion, 
each symbol from New in its actual size will be represented directly as 'a', 'b' 
etc, while the code for a symbol from New will be represented with the same 
characters and 'prime', e.g., 'a", 'b" etc. 

Let us suppose that 'a b c b t' in New is aligned with patterns from Old 
as shown in Figure [SJ In this case, the encoding of New in terms of Old would 
be 'P a' b' c' b' t' #P' or something similar. This encoding has more symbols 
than the sequence from New which is being encoded so, on a superficial reading, 
there should be zero or negative compression. But if the 'actual' size of each 
symbol from New is, say, 20, bits, if the 'minimum' size of each symbol from 
New and each of 'P' and '#P' is, say, 4 bits, then the encoding above represents 
a compression of (5 x 20) — (7 x 4) = 72 bits. 

4.4 Redundancy in the operation of digital systems 

The assumption, above, that each symbol in New contains redundancy may 
seem like an ad hoc assumption to salvage the conjecture on which this research 
is based. However, a little reflection may convince us that the assumption has 
some justification. 

4.4.1 Computing with '0' and '1' 

One of the attractions of digital systems is that they are relatively robust in 
the face of noise. If, as is normally the binary alphabet is used, the two 

symbols (normally '0' and '1') are typically represented by two different voltages 
whose values are different enough to ensure that the electronics can discriminate 
them with little or no error through billions of computations. 

Given that the technology is capable of discriminating a much larger number 
of different voltages than two, the use of only two voltages is analogous to the 
use of only 'A' and 'Z' from an alphabet of 26 symbols. Since an alphabet of 
26 symbols requires, in round figures, log 2 26 — 5 bits of information for each 
symbol (assuming that they are equiprobable) , the two symbols 'A' and 'Z' are 
expressed with 5 times as many bits as the theoretical minimum which is 1. 

If the analogy is accepted as valid, then we may conclude that digital comput- 
ing is based on the provision of redundancy in the 'bedrock' of its computations. 
Reducing the redundancy at this level (by bringing the difference in voltages 
closer to the threshold of what the electronics can discriminate) would cause 
errors to increase. Would any computing be possible if all redundancy at this 
level were eliminated? 
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4.4.2 Identifiers, addresses and codes 

If we consider 'higher' levels in the organisation of digital computers, we find 
that there is typically much redundancy in how entities of different kinds are 
identified. 

This is clearly true of identifiers like the names of files, names of tables 
and fields in databases, and the names of functions, variables and so on in 
computer programs. It may be objected with some justice that this redundancy 
exists primarily to serve the psychological needs of humans who must read 
and remember these identifiers and that the identifiers are all converted into 
addresses on disk or in computer memory which is where 'real' computing is 
done. 

But addresses to locations in computer memory also contain redundancy. It 
is rare for any application to use every location which has been provided, so 
more bits are needed to identify each location than are strictly necessary. It 
may be objected, with justice, that this is because, with current technology, we 
cannot provide new memory locations only when required. More to the point, 
perhaps, is that some memory locations are likely to be used more often than 
others but the addressing system does not take advantage of this by assigning 
short addresses to frequently-accessed locations and longer addresses to loca- 
tions which are accessed rarely. Indirect addressing may go partly down this 
road but it is only an approximate solution. 

In a similar way, fixed-length codes for symbols or other entities (e.g., the 
8-bit codes of the ASCII character set) are convenient but are normally quite 
redundant because they do not exploit variations in the frequencies with which 
different codes are used. 

4.4.3 Huffman coding 

Of course, there are well-known techniques for constructing variable-length 
codes to take advantage of variations in frequency. The best-known of these 
is Huffman coding (see |Cover and Thomas 91] ). 

It is often said that the sizes of Huffman codes are at the theoretical mini- 
mum. In one sense, this is true but it is slightly misleading. In order to 'parse' 
a sequence of binary symbols representing a sequence of Huffman codes, it is 
necessary to use the trie which was created alongside the Huffman codes. The 
codes themselves may be minimal but, in themselves, they are unusable. The 
additional information needed to specify the trie may be seen as redundancy 
within the whole system. 

4.4.4 Discussion 

This quick review of the workings of digital computers has shown that redun- 
dancy pervades the system in the way that symbols and other entities are iden- 
tified. And even where it seems that all redundancy has been eliminated, it can 
be argued that it is present in a covert form. 
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Redundancy equates with 'structure'. If a computing system, including its 
programs, is totally lacking in redundancy then it is totally unstructured - which 
means that it is totally random. A system which is totally random and unstruc- 
tured is no system at all. It certainly cannot support the highly structured and 
predictable kinds of information processing that we know as 'computing'. In a 
similar way, meaningful computing depends on the existence of redundancy at 
some level in the input data, even if it is only in the symbolic structure of the 
data. 

These considerations suggest that the deliberate introduction of redundancy 
into the symbols in New in the SP model is more than an ad hoc 'fix' for a 
theoretical anomaly. It seems that computing depends in a fundamental way on 
the existence of redundancy at some level both in the structure of the computer 
and its programs and also in the data to be processed by the computer. 

5 Augmentation of established models of com- 
puting 

Section 0] has described, with examples, how the operation of a PCS in normal 
form may be understood as ICMAUS. Since it is known that any PCS may be 
modelled by a PCS in normal form f jPost 43| . |Minsky 67\ Chapter 13]), we 
may conclude, tentatively, that the operation of any PCS may be interpreted 
in terms of ICMAUS. Since we also know that any UTM may be modelled by 
a PCS Minsk y~67| Chapter 14] we may also conclude, tentatively, that the 
operation of any UTM may be interpreted in terms of ICMAUS. 

Although the operation of a PCS or UTM may understood in terms of IC- 
MAUS, it does not require the full range of capabilities of a well-developed IC- 
MAUS system. Matching in a PCS or UTM is done in a relatively constrained 
manner fSection l2. 4.1(1 which does not require sophisticated search. 

What is envisaged for the proposed SP system is a much more fully devel- 
oped realisation of ICMAUS mechanisms including well-developed abilities for 
heuristic search, for finding 'good' partial matches between patterns and for 
building a wide variety of alignments amongst patterns. 

Figure ITU1 shows a schematic representation of the proposed 'SP' computer 
with a schematic representation of a conventional computer (or UTM or PCS 
etc) for comparison. The next two subsections consider each of these in turn. 

5.1 Conventional computers (UTM, PCS etc) 

Schematically, the conventional computer (or UTM or PCS etc) comprises the 
core machine (C), together with software across a range of applications (S). 

In terms of the ideas described in Sections 2 and 4, we may suppose that 
C in the conventional computer may be interpreted as ICMAUS with relatively 
simple search mechnanisms which operate in a somewhat rigid 'clockwork' man- 
ner. 
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In a PCS, searching and matching occurs when the input string is compared 
with the left-hand sides of productions. In a UTM, searching and matching 
occurs when a symbol newly read from the tape, together with a symbol repre- 
senting the state of the read-write head, is compared with the left-hand part of 
each of several tuples in the transition function. Where exact matches are found, 
the right-hand part(s) of the corresponding tuples control (in a deterministic or 
non-deterministic manner) what happens next. 

In a conventional computer, one manifestation of searching and matching is 
the process by which an item in memory is accessed using the address of that 
item. This may be seen as a process of searching through addresses in memory 
until a match is found for the given address. Accessing an item in memory 
is served by logic circuits which, with great speed, eliminate areas of memory 
progressively until the item is found. 

This kind of highly constrained form of searching and matching has advan- 
tages for certain kinds of problem which can be solved with great accuracy and 
speed. The penalty is that other kinds of problem which require more flexibility 
cannot be solved unless additional mechanism is provided in S. 

Thus, owing to the limitations of C in the conventional computer, many 
applications within S contain additional ICMAUS mechanisms designed to make 
good the deficiencies of the core mechanisms. Similar mechanisms are provided 
repeatedly in diverse applications. For example, search in various forms is found 
in database systems, expert systems, parsers, compilers, text editors and DTP 
software (especially software for spelling checking and correction), spreadsheets, 
genetic algorithms of various kinds, data mining software, and many other more 
or less specialised kinds of software. Thus there is much more of this kind of 
repetition than is suggested by the three instances of ICMAUS shown in the 'S' 
part of the conventional computer as portrayed in Figure ITU1 

As already indicated, ICMAUS mechanisms provided in C of the conven- 
tional computer do not exploit the full potential of this type of mechanism. 
The ICMAUS mechanisms in S are also incomplete because, in general, each 
one provides only what is required for a specific application and, out of those 
requirements, they normally provide only what is missing from C. Hence, to 
show that these instances of ICMAUS mechanisms in C and S of the conven- 
tional computer are not fully developed, they are represented in Figure HCI using 
relatively small type. 

What else is in S? In general, the parts of S which are not ICMAUS mech- 
anisms are domain-specific knowledge about such things as the rules of ac- 
counting, the classes of objects found in a particular organisation (each with its 
associated data structures and methods), the laws of thermodynamics, aerody- 
namics or topology, and many other concepts across all the many areas where 
computers can be applied. 

5.2 The proposed 'SP' computer 

In the proposed SP computer, the core mechanism (C) comprises ICMAUS 
mechanisms developed as fully as our understanding and available resources 
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allow. Although C in the SP computer is probably larger and more complex than 
C in the conventional computer, the anticipated pay-offs are the elimination of 
repeated instances of ICMAUS in S giving an overall simplification of the system 
(C + S) and better integration across applications. 

The key difference between ICMAUS in C in a conventional computer and 
ICMAUS in C of an SP computer seems to be that, without additional program- 
ming, the former is restricted to exact matches between sequences of symbols 
whereas the latter should have a well-developed capability for finding good par- 
tial matches between sequences of symbols or, in advanced versions, good partial 
matches between arrays of symbols in two or more dimensions. The core of the 
projected SP computer should also contain mechanisms for building multiple 
alignments of the kind described in Section 0] 

5.2.1 Constraining the SP computer to behave like a conventional 
computer 

Although the SP computer should be able to find good partial matches between 
arrays of symbols, it should be possible to restrict it to exact matches between 
sequences of symbols if that is required for any particular application. Thus, 
with suitable constraints (see Section P2.4.1[l . the SP computer may be made to 
function, when required, as if it was a conventional computer. 

5.3 Minimum Length Encoding in the design of comput- 
ers? 

The principles just described are similar to the principle of Minimum Length 
Encoding in grammar induction (described in Section I2.1J1 . 

In the case of grammar induction, the principle requires a search for a gram- 
mar which minimises the size of the grammar together with the size of the 
sample when it is encoded in terms of the grammar. This means compressing 
the sample and making sure that the grammar which is derived from the sample 
is counted as part of the 'cost' of compression. 

In the case of computer design, we are aiming to compress the system in- 
cluding both the 'core' of the computer and all its software. Very roughly, we 
may equate C in Figure 1101 with the grammar in grammar induction and we 
may equate S in computers with the data in grammar induction. In the same 
way that an increase in the size of a grammar can mean an overall decrease in 
the size of grammar and data taken together, an increase in the size of C can 
mean an overall reduction in (C + S), as suggested in Figure ITU1 

5.4 Extending established ideas 

The principles illustrated in Figure ITUl are not new. The same kinds of principle 
provide the motivation for the creation of database management systems or ex- 
pert system shells. In this type of application, and others, a system is created 
which can be used as the foundation for a range of applications of one kind, 
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Figure 10: Schematic representation of a conventional computer and the proposed 'SP' 
computer in terms of the ICMAUS proposals. Key: C = the 'core' of the conventional 
computer and of the SP computer; ICMAUS (in small type) = ICMAUS mechanisms 
(described in Section|2Jl in various relatively restricted forms. ICMAUS (in large type) 
= ICMAUS mechanisms in a relatively fully developed and generic form; S = software 
to cover a wide range of applications. In the conventional computer, S includes many 
instances of (small) ICMAUS - more than the three instances shown here. It is assumed 
that the data to be processed is the same for both computers. 
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thus saving the need to reprogram the basic mechanisms in each application. In 
accordance with remarks above about cost and quality, the use of database man- 
agement systems and expert system shells can mean dramatic savings in costs 
and gains in quality compared with programming each application individually. 

What is different about the present proposals is that their anticipated scope 
is much wider than the scope of a database management system or of an expert 
system shell. Evidence of the possible scope of the system is provided in the 
sources cited in Section IT~2l 

5.5 Design of an SP computer 

Given current technology, the most straightforward way to create an SP com- 
puter is to program it as a software virtual machine running on a conventional 
computer. Although the SP61 model has been created in this way as a test-bed 
and demonstrator for theoretical concepts, it is also, in effect, a realisation of 
an SP computer. 

On longer time-scales, creating an SP computer in this way is almost cer- 
tainly not the best way to do it. Reasons why this should be so are considered 
in the next two subsections. 

5.5.1 New hardware needed 

In the 'core' of a conventional computer (as in the basic mechanisms of the 
UTM and PCS), mechanisms for searching and matching are not programmed. 
In the case of the UTM and the PCS, the mechanisms are part of the definition. 
In digital computers, notwithstanding the existence of 'microcode', core mech- 
anisms are, in general, provided by dedicated hardware like the logic circuits 
mentioned earlier for accessing an item in memory by means of its address. 

In a similar way, it is not necessary for the search mechanisms in an SP 
computer to be programmed. Creating these mechanisms by programming a 
conventional computer is a bit like cleaning the bathroom with a toothbrush 
(as Goldie Hawn was required to do in "Private Benjamin"). It can be done 
but the basic tools are clearly not designed for the job. In the long run, it is 
likely that hardware techniques may be found which meet the requirements of 
the system more directly. 

New hardware does not necessarily mean new electronic hardware. There is 
potential for flexible pattern-matching using light with high levels of parallelism 
(see next), especially since light rays can cross each other without interfering 
with each other. As noted previously (Section 12. 7. Ill , there is also potential 
for searching huge pattern-matching search spaces in parallel using chemicals in 
solution (or other form), as, for example, in DNA computing. 

5.5.2 Parallel processing 

Although the estimated time complexity and space complexity of the current 
SP model are good, the ICMAUS mechanisms are intrinsically 'hungry' for com- 
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puting power unless they arc constrained to work like a conventional computer. 
One possible solution to this problem is the creation of an SP system as a 
software virtual machine running on a high-parallel computer comprising many 
(cheap) conventional processors. This approach is perhaps more promising than 
a software virtual machine on a single processor. Returning to the toothbrush 
analogy, it would be like a cleaning device comprising thousands of toothbrushes 
- conceivably quite effective. 

Alternatives, as before, include new devices based on electronics, light or 
chemistry, or some combination of those mechanisms. 

6 Conclusion 

This article has presented the 'SP' concepts in outline and has tried to show how 
established models of computing may be interpreted in terms of information 
compression by multiple alignment, unification and search. It has also been 
argued that established models may be augmented to exploit more fully the 
potential functionality of the ICMAUS mechanisms. 

Although this augmentation would increase the size and complexity of the 
core computational process, research to date suggests that there could be a 
reduction in the size and complexity of software - and this could mean an overall 
saving in the size and complexity of the combination of core system and software. 
In terms of practicalities, less software should mean lower costs and fewer errors 
in working systems. There should also be better integration across diverse 
applications. 

In accordance with 'Occam's razor' (and, indeed, with the MLE principle 
incorporated in the SP theory), a good theory should combine 'Simplicity' with 
explanatory or descriptive 'Power'. Established models of computing are very 
simple but they leave many problems unsolved. The conceptual framework de- 
scribed in this article seems to provide an interpretation for established models. 
At the same time, it suggests how these models may, with advantage, be en- 
larged. The resulting reduction in Simplicity may be more than offset by an 
increase in descriptive and explanatory Power. 
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A Definitions of concepts 



A.l Basic concepts 
Definition 1 

A symbol is some kind of mark which can be compared with any other symbol. 
In the context of pattern matching, a symbol is the smallest unit which can 
participate in matching: a symbol can be compared (matched) only with another 
single symbol and the result of matching is either that the two symbols are the 
same or that they are different. No other result is permitted. 

An important feature of the concept of a symbol, as it is used in this research, 
is that, with respect to the way it behaves within the system, it has no hidden 
meaning. In this research, a symbol is nothing but a primitive mark which can 
be discriminated in a yes/no manner from other symbols. There are no symbols 
like the symbols in an arithmetic function (e.g., '6', '22', '+', 'x', '/', '(', ')' 
etc), each of which has a meaning for the system which is not directly visible. 

Any symbol may have a meaning but that meaning must take the form of 
one or more other symbols which are associated with the given symbol and are 
explicit and visible within the structure of symbols and patterns. 

Although symbols in the proposed framework have no hidden meaning hav- 
ing an effect on how the symbol behaves in the system, they can and often do 
have a meaning for the human reader. 

If two symbols match, we say that they belong to the same symbol type. 
In any system which contains symbols, we normally recognise an alphabet of 
symbol types such that every symbol in the system belongs in one and only one 
of the symbol types in the alphabet, and every symbol type is represented at 
least once in the system. 

A positive match between two symbols is termed a hit. In an alignment of 
two more patterns, one or more unmatched symbols within one pattern between 
two hits or between one hit and the end of the pattern is a gap. 

Definition 2 

A pattern is an array of symbols in one, two or more dimensions. In this 
article, one dimensional patterns (sequences or strings of symbols) are the 
main focus of attention. 

The meaning of the term pattern includes the meanings of the terms substring 
and subsequence, defined next. 

Definition 3 

A substring is a sequence of symbols of length n within a sequence of length m, 
where n ^ m and where the constituent symbols in the substring are contiguous 
within the sequence which contains the substring. 



38 



Definition 4 



A subsequence is a sequence of symbols of length n within a sequence of length 
to, where n $C m and where the constituent symbols in the subsequence may 
not be contiguous within the sequence which contains the subsequence. The 
set of all subsequences of a given sequence includes all the substrings of that 
sequence. 

A. 2 The Post Canonical System in normal form 
Definition 5 

A Post Canonical System in normal form is a quadruple PCS = (A, I, P, 

S), where A is a finite set (alphabet) of one or more symbols (which Post calls 
'letters'), I is an 'axiom' or 'input' comprising a finite string of one or more 
symbols (drawn with replacement from A), P is a finite set of one or more 
'productions' and S is a method of searching for a match between a string of 
symbols and one or more of the productions. 10 
Each production has the form: 

g $ -> $h, 

where g and h each represent a string of zero or more symbols (drawn with 
replacement from S), and both instances of '$' represent a single 'variable' which 
may have a value comprising a string of zero or more symbols. 

A PCS (in normal form) operates by searching (using the method S) for one 
or more exact matches between the leading substrings in I and the string g in one 
or more of the productions (P). Wherever an exact match is found, the trailing 
substring (if any) is assigned to the variable, $. The string which comprises the 
value of '$' followed by the string h (which may be termed 'output') is then 
treated as new input, and matches are sought for this input as before. The 
process terminates if and when no more matches can be found. 

A key characteristic of S is that, for any 'input' string (i.e., the original input 
string or one of the strings formed by the system which are processed in the 
same way), it should be able to find every production where the string g in the 
production is an exact match for a leading substring in the input string. 

Comments 

Notice that any search for a match between a symbol string and the leading 
symbols of productions can yield more than one positive result - and a given 

10 The definition of a PCS given here is drawn from the descriptions in Post 43 and 
|Minsky 67| except that S has been identified explicitly as an element of the system. In 
both Post's and Minsky's accounts, the process of searching for matches between strings and 
substrings is not formally identified as an element of the system and the descriptions which 
they give of S are informal. The reason for identifying S explicitly in this article is to empha- 
sise what appears to be the key difference between a PCS (or UTM) and generalised forms of 
the SP system, as discussed in Section l5l 
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computation may develop two or more 'strands' with further branching at any 
stage (see |Minsky 67| Chapter 13]. Thus PCSs can accommodate both 'deter- 
ministic' and 'non-deterministic' styles of computing. 

A.3 The SP System 
Definition 6 

An SP System is a quadruple SPS = (A', N, O, S'), where A' is a finite set 
(alphabet) of one or more primitive symbols, N is a pattern of symbols ('New'), 
O is a finite set of one or more patterns of symbols ('Old'), and S' is a search 
method, different from and, in some sense, more sophisticated than S in a PCS 
(see below). 

Using S', the system operates by searching for one or more alignments (de- 
fined next), where each alignment contains N and one or more patterns from 
O and where each alignment promotes the compression of N by unification of 
matching symbols in the alignment. 

Definition 7 

In the case of one-dimensional patterns, 11 an alignment is a two-dimensional 
array of one or more patterns, each one in a separate row in the array. The 
alignment shows sets of two or more matching symbols by arranging the symbols 
in each set in a column of the array. 12 In an alignment, as defined in this 
research: 

• Symbols which are contiguous in a pattern which appears in an alignment, 
need not occupy contiguous cells in the array. 

• Any one pattern may appear one or more times in an alignment. 

• Where a pattern appears two or more times in an alignment, no symbol 
in one appearance of the pattern should ever be shown as matching the 
same symbol in another appearance of the pattern. 

• Any symbol in one pattern may be placed in the same column as any 
other symbol from the same pattern or another pattern, providing order 
constraints are not violated. 

For any alignment, order constraints are preserved if the following state- 
ment is always true: 

11 As previously noted, the concept of an alignment can be generalised in a straightforward 
manner to patterns of two or more dimensions. But no attempt is made here to provide a 
formal definition for alignments of patterns of two dimensions or higher. 

12 The fact that, in displaying alignments, it can sometimes be convenient to put non- 
matching symbols in the same column with lines to mark the symbols that do match (as in 
Figure^ is not relevant to the abstract definition of an alignment presented here. 
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For any two rows in the alignment, A and B, and any four symbols, 
Ai and A2 in A, and Bi and B2 in B, if Ai is in the same column 
as Bi, and if A 2 is in the same column as B 2 , and if A 2 in A follows 
Ai in A, then B 2 in B must follow Bi in B. 

This condition holds when the two rows contain two different patterns and also 
when the two rows contain two appearances of one pattern. 

Definition 8 

A mismatch in an alignment occurs when, between to columns in the align- 
ment containing hits, or between one column containing hits and the end of the 
alignment, there are no other columns containing hits and there are two more 
columns containing single symbols from two or more different patterns in Old. 

Definition 9 

The search space of an SPS is the set of possible alignments of N and the 
patterns in O. 

Definition 10 

An alignment is equivalent to one step in the operation of a PCS if: 

• The alignment contains no mismatches. 

• Every symbol in I of the PCS can be mapped to an identical symbol in 
the pattern N in the alignment. 

• The order of the symbols in I is the same as the order of the matching 
symbols in N. 

• For the production in the PCS which enters into the given operation, 
every symbol in that production can be mapped to an identical symbol 
in a pattern from O which appears in the alignment, and the order of 
the symbols in the production are the same as the order of the matching 
symbols in the corresponding pattern. 

• For every positive match between two symbols in the operation of the PCS 
there is a column in the alignment containing a matching pair of symbols. 
The order of these columns in the alignment must be the same as the order 
of the corresponding pairs of symbols in the operation of the PCS. 
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